file name
How to Go Paperless in 9 Steps
Has Your Pledge to Go Paperless Perished? You promised yourself you'd digitize every last receipt, document, and paper record. But the trick to getting rid of paper is to not worry about being perfect. Wanting to get rid of paper in your life is easy. Following through with that promise to yourself is hard.
Repository-Aware File Path Retrieval via Fine-Tuned LLMs
Yanuganti, Vasudha, Puri, Ishaan, Chhatre, Swapnil, Singh, Mantinder, Jallepalli, Ashok, Shrivastava, Hritvik, Sharma, Pradeep Kumar
Modern codebases make it hard for developers and AI coding assistants to find the right source files when answering questions like "How does this feature work?" or "Where was the bug introduced?" Traditional code search (keyword- or IR-based) often misses semantic context and cross-file links, while large language models (LLMs) understand natural language but lack repository-specific detail. We present a method for file path retrieval that fine-tunes a strong LLM (Qwen3-8B) with QLoRA and Unsloth optimizations to predict relevant file paths directly from a natural language query. To build training data, we introduce six code-aware strategies that use abstract syntax tree (AST) structure and repository content to generate realistic question-answer pairs, where answers are sets of file paths. The strategies range from single-file prompts to hierarchical repository summaries, providing broad coverage. We fine-tune on Python projects including Flask, Click, Jinja, FastAPI, and PyTorch, and obtain high retrieval accuracy: up to 91% exact match and 93% recall on held-out queries, clearly beating single-strategy training. On a large codebase like PyTorch (about 4,000 Python files), the model reaches 59% recall, showing scalability. We analyze how multi-level code signals help the LLM reason over cross-file context and discuss dataset design, limits (for example, context length in very large repos), and future integration of retrieval with LLM-based code intelligence.
code_transformed: The Influence of Large Language Models on Code
Xu, Yuliang, Huang, Siming, Geng, Mingmeng, Wan, Yao, Shi, Xuanhua, Chen, Dongping
Coding remains one of the most fundamental modes of interaction between humans and machines. With the rapid advancement of Large Language Models (LLMs), code generation capabilities have begun to significantly reshape programming practices. This development prompts a central question: Have LLMs transformed code style, and how can such transformation be characterized? In this paper, we present a pioneering study that investigates the impact of LLMs on code style, with a focus on naming conventions, complexity, maintainability, and similarity. By analyzing code from over 19,000 GitHub repositories linked to arXiv papers published between 2020 and 2025, we identify measurable trends in the evolution of coding style that align with characteristics of LLM-generated code. For instance, the proportion of snake_case variable names in Python code increased from 47% in Q1 2023 to 51% in Q1 2025. Furthermore, we investigate how LLMs approach algorithmic problems by examining their reasoning processes. Given the diversity of LLMs and usage scenarios, among other factors, it is difficult or even impossible to precisely estimate the proportion of code generated or assisted by LLMs. Our experimental results provide the first large-scale empirical evidence that LLMs affect real-world programming style.
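To make the snake_case statistic concrete, here is a minimal sketch of how such a proportion could be measured for one Python source string. It is not the authors' measurement pipeline: the helper names `assigned_names` and `snake_case_share` are hypothetical, and the rule that a snake_case name must contain at least one underscore between lowercase words is an assumption made for this illustration.

```python
import ast
import re

# Assumption: a snake_case name is lowercase and has at least one underscore
# separating its parts (so a single word such as "x" does not count).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$")


def assigned_names(source):
    """Collect every variable name that is assigned to in the source."""
    tree = ast.parse(source)
    return [
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    ]


def snake_case_share(source):
    """Return the fraction of assigned names that match SNAKE_CASE."""
    names = assigned_names(source)
    if not names:
        return 0.0
    hits = sum(1 for name in names if SNAKE_CASE.match(name))
    return hits / len(names)
```

Calling `snake_case_share` on each file of a repository and averaging per quarter would give a trend line of the kind the abstract reports, although the result depends heavily on how snake_case is defined.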
Otter: Generating Tests from Issues to Validate SWE Patches
Ahmed, Toufique, Ganhotra, Jatin, Pan, Rangeet, Shinnar, Avraham, Sinha, Saurabh, Hirzel, Martin
While there has been plenty of work on generating tests from existing code, there has been limited work on generating tests from issues. A correct test must validate the code patch that resolves the issue. In this work, we focus on the scenario where the code patch does not exist yet. This approach supports two major use-cases. First, it supports TDD (test-driven development), the discipline of "test first, write code later" that has well-documented benefits for human software engineers. Second, it also validates SWE (software engineering) agents, which generate code patches for resolving issues. This paper introduces Otter, an LLM-based solution for generating tests from issues. Otter augments LLMs with rule-based analysis to check and repair their outputs, and introduces a novel self-reflective action planning stage. Experiments show Otter outperforming state-of-the-art systems for generating tests from issues, in addition to enhancing systems that generate patches from issues. We hope that Otter helps make developers more productive at resolving issues and leads to more robust, well-tested code.
Document Type Classification using File Names
Li, Zhijian, Larson, Stefan, Leach, Kevin
Rapid document classification is critical in several time-sensitive applications like digital forensics and large-scale media classification. Traditional approaches that rely on heavy-duty deep learning models fall short because of the high inference times over vast input datasets and the computational resources needed to analyze whole documents. In this paper, we present a method that uses lightweight supervised learning models, combined with a TF-IDF feature extraction-based tokenization method, to classify documents accurately and efficiently based solely on their file names, which substantially reduces inference time. This approach can distinguish ambiguous file names from indicative file names through confidence scores and through a negative class representing ambiguous file names. Our results indicate that file name classifiers can process more than 80% of the in-scope data with 96.7% accuracy when tested on a dataset with a large portion of out-of-scope data with respect to the training dataset, while being 442.43x faster than more complex models such as DiT. Our method offers a crucial solution for efficiently processing vast datasets in critical scenarios, enabling faster, more reliable document classification.
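The idea of classifying by file name alone, with a confidence score that flags ambiguous names, can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the paper's models: it swaps in a nearest-centroid classifier over TF-IDF vectors, and the tokenizer, the class name `FileNameClassifier`, the `threshold` value, and the `"ambiguous"` label are all hypothetical. It uses only the confidence cutoff, not the paper's trained negative class.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(file_name):
    """Lowercase a file name and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", file_name.lower())


class FileNameClassifier:
    """Nearest-centroid classifier over TF-IDF vectors of file-name tokens."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold  # assumed cutoff for "ambiguous" names
        self.idf = {}
        self.centroids = {}

    def _vector(self, file_name):
        # Tokens never seen during training carry no weight.
        counts = Counter(t for t in tokenize(file_name) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def fit(self, file_names, labels):
        n_docs = len(file_names)
        doc_freq = Counter(t for name in file_names for t in set(tokenize(name)))
        # Smoothed IDF keeps every known token at a positive weight.
        self.idf = {
            t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()
        }
        sums = defaultdict(Counter)
        for name, label in zip(file_names, labels):
            sums[label].update(self._vector(name))
        self.centroids = {}
        for label, total in sums.items():
            norm = math.sqrt(sum(w * w for w in total.values()))
            self.centroids[label] = (
                {t: w / norm for t, w in total.items()} if norm else {}
            )
        return self

    def predict(self, file_name):
        """Return (label, confidence); low confidence yields "ambiguous"."""
        vec = self._vector(file_name)
        scores = {
            label: sum(w * centroid.get(t, 0.0) for t, w in vec.items())
            for label, centroid in self.centroids.items()
        }
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] < self.threshold:
            return "ambiguous", scores.get(best, 0.0)
        return best, scores[best]
```

A name such as `IMG_0042.jpg` shares no tokens with the training names, so its cosine similarity to every class is zero and it falls below the cutoff. That mirrors the abstract's point that uninformative file names should be set aside rather than forced into a class.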
Emotion Detection using Python - Geeky Humans
In this tutorial, we'll see how to create a Python program that detects emotion on a human face. This might be interesting if you want to do things like emotion detection using Python, or if you're training machine learning systems to read human emotions. We're going to create a program that takes an image as input and outputs a list of the human emotions that the image shows. To do this, we're going to use a package called Deepface. Deepface is an open-source face recognition and facial attribute analysis framework that was created for Python.
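As a rough sketch of what such a program might look like, the snippet below wraps Deepface's `DeepFace.analyze` call with the `emotion` action. The function names `dominant_emotions` and `detect_emotions` are hypothetical, and the handling of the return value is an assumption based on the library's behaviour: recent releases return a list with one dictionary per detected face, while older releases returned a single dictionary.

```python
def dominant_emotions(analysis):
    """Return the dominant emotion reported for each detected face.

    Accepts either a list of per-face dicts or a single dict, since the
    shape of the result has differed between deepface releases.
    """
    faces = analysis if isinstance(analysis, list) else [analysis]
    return [face["dominant_emotion"] for face in faces]


def detect_emotions(image_path):
    """Analyze the image at image_path and list the dominant emotions."""
    # Imported lazily so the helper above can be used without deepface.
    from deepface import DeepFace

    analysis = DeepFace.analyze(img_path=image_path, actions=["emotion"])
    return dominant_emotions(analysis)
```

Calling `detect_emotions` needs the `deepface` package installed, and it downloads model weights the first time it runs.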
Machine Learning
For processing the data we need some packages. Then we need a package to store our data. Alright, now that we have our packages, let us create a variable for each of our two paths, one for the "all" folder and another for the "hem" folder. Now, we need to point Python toward these folders (the path variables above) and store the file names within them as a list. These two lines give you two lists, one for the "all" images and another for the "hem" images. Alright, now it is time to store the data from these images. The first line in the above code creates an empty data frame.
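The code this walkthrough refers to is not reproduced here, so the following is only a minimal sketch of the steps it describes: two folder paths, a list of file names for each, and one record per image ready to be stored. The function names `list_file_names` and `build_records` and the record fields are hypothetical. The sketch uses plain dictionaries from the standard library instead of a data frame.

```python
import os


def list_file_names(folder):
    """Return the sorted names of the files directly inside folder."""
    return sorted(
        name
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


def build_records(all_path, hem_path):
    """Build one record per image file, labelled by the folder it came from."""
    records = []
    for label, folder in (("all", all_path), ("hem", hem_path)):
        for name in list_file_names(folder):
            records.append(
                {
                    "file_name": name,
                    "label": label,
                    "path": os.path.join(folder, name),
                }
            )
    return records
```

With pandas installed, `pandas.DataFrame(build_records(all_path, hem_path))` would turn these records into a data frame with one row per image, where `all_path` and `hem_path` stand for wherever the two folders live on disk.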
BRIEF: Everything We Know About 1970s Mainframe RPGs We Can No Longer Play
A PLATO terminal in a museum case at the University of Illinois; photo taken by the author in 2013. This entry summarizes a series of 1970s mainframe games that have been so lost we don't even have screenshots. I also asked several dozen PLATO authors, administrators, and former CRPG Addict contributors--everyone I could find--for any additional recollections about the games. I stopped only when I was confident there was nothing left to learn. If you have any new or conflicting information about any of the games below, I welcome your comments below or an e-mail to crpgaddict@gmail.com. I will update the information below with any new material discovered. However, please do not take it upon yourself to try to track down and contact any of the people listed here on my behalf; it is likely that I have already reached out and they either declined to respond or already told me all they could. Except for Don Daglow's Dungeon, all the games listed below were written in a language called TUTOR for the PLATO educational mainframe hosted by the University of Illinois Urbana-Champaign. Many of the games written on this system have been preserved and are playable today at Cyber1.
6 Python Projects You Can Finish in a Weekend
Learning Python can be difficult. You might spend a lot of time watching videos and reading books; however, if you can't put all the concepts learned into practice, that time will be wasted. This is why you should get your hands dirty with Python projects. A project will help you bring together everything you've learned, stay motivated, build a portfolio and come up with ways of approaching problems and solving them with code. In this article, I listed some projects that helped me level up my Python code and hopefully will help you too.